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Abstract: We define and study an inference algorithm based on "belief propagation" 
(BP) and the Bethe approximation. The idea is to encode into a graph an a priori 
information composed of correlations or marginal probabilities of variables, and to 
use a message passing procedure to estimate the actual state from some extra real- 
time information. This method is originally designed for traffic prediction and is 
particularly suitable in settings where the only information available is floating car 
data. We propose a discretized traffic description, based on the Ising model of 
statistical physics, in order to both reconstruct and predict the traffic in real time. 
General properties of BP are addressed in this context. In particular, a detailed 
study of stability is proposed with respect to the a priori data and the graph topology. 
The behavior of the algorithm is illustrated by numerical studies on a simple traffic 
toy model. How this approach can be generalized to encode superposition of many 
traffic patterns is discussed. 
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Propagation de croyances et approximation de Bethe 
pour la prediction de trafic 



Resume : On definit et etudie un algorithme de reconstruction utilisant l'algorithme 
« Belief Propagation » (propagation de croyances, BP) et l'approximation de Bethe. 
L'idee est d'encoder dans un graphe des donnees a priori composees de correla- 
tions ou de lois marginales et d'utiliser une procedure de passage de messages pour 
estimer l'etat reel a partir d'informations temps-reel. Cette methode, developpee 
pour des besoins de prediction de trafic, est particulierement adaptee au cas ou la 
seule information disponible provient de vehicules sonde (Floating Car Data). Nous 
proposons une discretisation binaire du trafic s'appuyant sur le modele d'Ising de 
physique statistique, permettant de reconstruire et de predire le trafic en temps 
reel. Des proprietes generales de l'algorithme BP sont discutees dans ce contexte. 
En particulier une etude detaillee des proprietes de stabilite fonction des donnees a 
priori et de la topologie du graphe est fournie. Une etude numerique sur un modele 
de trafic simplifie permet d'illustrer le fonctionnement de l'algorithme. La fagon 
de generaliser cette approche pour encoder une superposition de plusieurs etats de 
trafic est discutee. 

Mots-cles : propagation de croyances, approximation de Bethe, reconstruction de 
trafic, prediction, systemes de transport intelligent, vehicules traceurs 
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1 Introduction 

With an estimated 1% GDP cost in the European Union (i.e. more than hundred 
billions euros) , congestion is not only a time waste for drivers and an environmental 
challenge, but also an economic issue. This is why the European commission financed 
the REACT project, where new traffic prediction models have been developed. These 
predictions are to be used to inform the public and possibly to regulate the traffic. 

Today, some urban and inter-urban areas have traffic management and advice 
systems that collect data from stationary sensors, analyze them, and post notices 
about road conditions ahead and recommended speed limits on display signs located 
at various points along specific routes. However, these systems are not available 
everywhere and they are virtually non-existent on rural areas. With rural road 
crashes accounting for more than 60% of all road fatalities in OECD (Organization 
for Economic Cooperation and Development) countries, the need for a system that 
can cover these roads is compelling if a significant reduction in traffic deaths is to 
be achievable. 

The REACT project combines a traditional traffic prediction approach on equipped 
motorways with an innovative approach on non-equipped roads. The idea is to obtain 
floating car data from a fleet of probe vehicles and reconstruct the traffic conditions 
from this partial information. To understand why it is not possible to fuse these two 
parts, we have to go a bit more into prediction algorithms details. 

Two types of approaches are usually distinguished, namely data driven (applica- 
tion of statistical models to a large amount of data, for example regression analysis) 
and model based (simulation or mathematical models explaining the traffic patterns). 
As we stated before, the choice is largely led by the availability of data. In our case, 
since little data is available on non-equipped roads (only the equipped vehicles driv- 
ing along the observed roads), the model driven approach is the only feasible one. For 
more information about traffic prediction methods, we refer the reader to [HHSUl]. 

Most current traffic models are deterministic, described either at a macroscopic 
level by a set of differential equations linking variables such as flow and density, 
or by Newton's law at a microscopic level where each individual car is considered. 
Intermediate descriptions are essentially kinetic models, like for example cellular 
automata [TT] , which are very well adapted to freeway traffic modeling and adapted 
to some extent to urban traffic modeling [3]. Traffic flow models are quite adapted 
and efficient on motorways where fluid approximation of the traffic is reasonable; 
they tend to fail for cities or rural roads. The reason is that the velocity flow 
field is subject to much greater fluctuations induced by the nature of the network 
(presence of intersections and short distance between two intersections) than by the 
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traffic itself. These fluctuations are both spatial and temporal (a red or green traffic 
light at a cross-road, a road- work, etc). There is no local stationary regime for the 
velocity, the dynamics are dominated by the fluctuations. 

We propose in this paper a hybrid approach in the continuation of [5], by taking 
full advantage of the statistical nature of the information, in combination with a 
stochastic modeling of traffic patterns. In order to reconstruct the traffic and make 
predictions, we propose a model — the Bethe approximation (BA) — to encode the 
statistical fluctuations and stochastic evolution of the traffic and an algorithm — the 
belief propagation (BP) algorithm — to decode the information. Those concepts are 
familiar to the computer science and statistical physics communities since it was 
shown [16] that the output of BP is in general the Bethe approximation [3] . 

The paper is organized as follows: Section [2] describes the model and its rela- 
tionship to the Ising model and the Bethe approximation. The inference problem 
and our strategy to tackle it using the Belief Propagation algorithm are stated in 
Section El The implementation of these ideas requires some new results about the 
BP algorithm, which are the subject of Section this concerns in particular the 
effect of the normalization of the messages, the parameterization of the model and 
the stability of the fixed points. Section[5]is devoted to implementation details of the 
decoding algorithm and to some numerical results illustrating the method. Finally, 
some new research directions are proposed in Section [6j 

2 Traffic description and statistical physics 

The graph onto which we apply the belief propagation procedure is made of space- 
time vertices that encode both a location (road link) and a time (discretized on a 
few minutes scale). More precisely, the set of vertices is V = C ® Z + , where L 
corresponds to the links of the network and Z + to the time discretization. To each 
point a = (£,t) G V, we attach an information r a € {0, 1} indicating the state of 
the traffic (1 if congested, otherwise). Each cell is correlated to its neighbors (in 
time and space) and the evaluation of this local correlation determines the model. 
In other words, we assume that the joint probability distribution of ry = {r a , a £ 
V} € {0, 1} V is of the form 

p({r a ,aeV}) = Y[ 4>a(T a ) Y{ ^(r^rp) (2.1) 
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Fig. 2.1: Traffic network (a) and Ising model (b) on a random graph 

where £ C V 2 is the set of edges, and the local correlations are encoded in the 
functions ip and <f>. V together with £ describe the space-time graph Q and V(a) C V 
denotes the set of neighbors of vertex a. 

The model described by (|2.ip is actually equivalent to an Ising model [8] on Q, 
with arbitrary coupling between adjacent spins, the up or down orientation of each 
spin indicating the status of the corresponding link (Figure I2.ip . 

The homogeneous Ising model (uniform coupling constants) is a well-studied 
model of ferro (positive coupling) or anti-ferro (negative coupling) material in sta- 
tistical physics. It displays a phase transition phenomenon with respect to the value 
of the coupling. At weak coupling, only one disordered state occurs, where spins 
are randomly distributed around a mean-zero value. Conversely, when the coupling 
is strong, there are two equally probable states that correspond to the onset of a 
macroscopic magnetization either in the up or down direction: each spin has a larger 
probability to be oriented in the privileged direction than in the opposite one. 

From the point of view of a traffic network, this means that such a model is able 
to describe three possible traffic regimes: fluid (most of the spins up), congested 
(most of the spins down) and dense (roughly half of the links are congested). For 
real situations, we expect other types of congestion patterns, and we seek to associate 
them either to the p-state Potts model if we extend the binary to p-ary description, 
or to the possible states of an inhomogeneous Ising model with frustration (i.e. 
with possibly negative coupling parameters), referred as spin glasses in statistical 
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physics |lUj . When such a system is frustrated because some negative couplings, 
leading to a certain number of contradictions, a proliferation of meta-stable states 
occurs, which eventually scales exponentially with the size of the system. 

On a simply connected graph, the knowledge of the one-vertex and two-vertices 
marginal probabilities is sufficient [12J to fully determine the measure (|2.ip . 



where tfo, denotes the number of neighbors of a. Since our space time graph Q is 
multi-connected, this relationship between local marginals and the full joint prob- 
ability measure can only be an approximation, which in the context of statistical 
physics is referred to as the Bethe approximation. This approximation is provided 
by the minimum of the so-called Bethe free energy, which, based on the form 
is an approximate form of the Kullback-Leibler distance, 



^(rv)||p(r V )) d = f ^6(r v )ln^. 



and which rewrites in terms of a free energy as 

D(b(r v )\\p(r v )) = HKrv)) ~ Hp(tv)), 

where 

HKrv)) =U(b(r v ))-S(b(r v )), (2.3) 
with the respective definitions of the energy U and of the entropy S 

U(b{T V )) = - ^ b a/3(r a , Tp) log Tp a p(T a ,Tf3) b a (T a ) log (j) a (T a ), 

(a,/3)e£ «€V 

S(b(T V )) = - b a/ 3(T a , rp) log b a p(r a , rp) - b a (r a ) log 6 a (r a ). 

(a,p)e£ aev 

In practice, what we retain from an Ising description is the possibility to encode 
a certain number of traffic patterns in a statistical physics model. This property is 
actually shared also by the Bethe Approximation (BA) and this is the reason for us 
to directly encode the traffic patterns in a BA rather than the inhomogeneous Ising 
model itself, based on historical data, and to avoid therefore an intermediate approx- 
imation step. BA simply provides us with a set of marginals probabilities that we 
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try to match with the historical data. But this set, which is the result of an iterative 
procedure, is not necessarily unique (see for example [9]) and the proliferation of 
possible solutions depends on the frustration induced by the historical correlations 
used to define the ^'s of (12, ip . The setting of our model consists therefore into an 
optimization procedure of the matching between the set of historical values obtained 
from probe vehicles and the set of marginal probabilities of the BA. 

The data collected from the probe vehicles is used in two different ways. The 
most evident one is that the data of the current day directly influences the prediction. 
In parallel, this data is collected over long periods (weeks or months) in order to 
estimate the model (12. ip . Typical historical data that is accumulated is 

• Pa(T a )- the probability that vertex a is congested (r Q = 1) or not (r a = 0); 

• p a p{ T a-,Tp): the probability that a probe vehicle going from a to (3 G V(q) 
finds a with state r a and (3 with state tr. 

The edges (a, (3) of the space time graph Q are constructed based on the presence of a 
measured mutual information between a and /3, which is the case when p a n{T a , tr) ^ 

Paij a )pp{Tp). 

3 The reconstruction and prediction algorithm 
3.1 Statement of the inference problem 

We turn now to our present work concerning an inference problem, which we set 
in general terms as follows: a set of observables ry = {r a , a G V}, which are 
stochastic variables are attached to the set V of vertices of a graph. For each edge 
(a, P) € £ of the graph, an accumulation of repetitive observations allows to build 
the empirical marginal probabilities {p a p}- The question is then: given the values 
of a subset Ty. = {r a », a* G V*}, what prediction can be made concerning V*, the 
complementary set of V* in V? 
There are two main issues: 

• how to encode the historical observations (inverse problem) in an Ising model, 
such that its marginal probabilities on the edges coincide with the p a /3^ 

• how to decode in the most efficient manner — typically in real time — this infor- 
mation, in terms of conditional probabilities P(r Q |Ty*)? 

The answer to the second question will somehow give a hint to the first one. 
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3.2 The belief propagation algorithm 

BP is a message passing procedure, which output is a set of estimated marginal prob- 
abilities, the beliefs b a p |12j . The idea of the BP algorithm is to factor the marginal 
probability at a given site in a product of contributions coming from neighboring 
sites, which are the messages. The messages sent by a vertex a to (3 G V(a) depends 
on the messages it received previously from other vertices: 

ma^t^) ^ n a ^f3(T a )(j) a (T a )tp a /3(T a ,Ti3), (3.1) 

r Q e{o,i} 

where 

n a ^0(T a ) = Yl m 7 ^ Q (r Q ). (3.2) 

7 eV(a)\{/3} 

In practice, the messages will be normalized so that 

^2 771^/3(7/3) = 1. (3.3) 

^€{0,1} 

We will come back to the effects of this in Section 14.21 

The output of the algorithm is a set of beliefs, which are an approximation of 
the one- vertex and two- vertices marginals of p(ry). The beliefs b a are reconstructed 
according to 

b a {T a ) oc 4> a {T a ) Yl nip^ a (T a ), (3.4) 

f3eV{a) 

and, similarly, the belief b a p of the joint probability of (r a ,r^) is given by 

b a /3(T a ,Tf3) OC n a ^f3(T a )n/3^ a (Tf3) X ^(Ta)^^)^ (T a , Tp) . (3.5) 

In the formulas above and in the remainder of this paper, the proportionality symbol 
oc indicates that one must normalize the beliefs so that they sum to 1. 

A simple computation shows that equations (|3.4p and (|3.5p are compatible, since 
d37JJ)d3l2l) imply that 

Y b a p{T a ,Tp) = 6/3(7/3). 

r a e{o,i} 

It has been realized a few years ago [T3] that the fixed points of the BP algorithm 
coincide with local minima of the Bethe free energy (12. 3|) . This justifies that we can 
use this algorithm to approximate our Ising model. 

We propose to use the BP algorithm for two purposes: estimation of the model 
parameters (the functions cp and ip) from historical data and reconstruction of traffic 
from current data. 
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3.3 Setting the model with Belief Propagation 

The fixed points of the BP algorithm (and therefore the Bethe approximation) allow 
to approximate the joint marginal probability p a g when the functions tjj a p and <f> a 
are known. Conversely, it can provide good candidates for ip a p and 4> a from the 
historical values p a p and p a . 

To set up our model, we are looking for a fixed point of the BP algorithm 
satisfying (|3.1|) - (|3.2|) and such that b a p(r a , rg) = p a f3(T a ,Tp) and therefore b a (T a ) = 

It is easy to check that the following choice of eft and ip, 

, , \ J5 a/3 (r a ,r^) 

1pa(3{T a ,T(3) = — , (3.6) 

<t>a{T a ) = p a (r a ), (3.7) 

leads (|2.ip to coincide with (|2.2p . They correspond to a normalized BP fixed point 
for which all messages are equal to 1/2. There is however no guarantee that this 
fixed point is a stable fixed point; actually, for an Ising-type system below the 
critical temperature, we often observe that this point is unstable (see Section [5] for 
the simulation results). It will be shown however in Section f4. II that this form of (f) 
and t/j is in some sense canonical. 

3.4 Traffic reconstruction and prediction 

Let V* be the set of vertices that have been visited by probe vehicles. Reconstruct- 
ing traffic from the data gathered by those vehicles is equivalent to evaluating the 
conditional probability 

Pa[T a \T V *) = , 

Pv{tv*) 

where ry* is a shorthand notation for the set {To,»} Q . e v*- 

The BP algorithm applies to this case if a specific rule is defined for vertices 
a* G V*: since the value of r a * is known, there is no need to sum over possible 
values and (|3.ip becomes 

The resulting algorithm is supposed to be run in real time, over a graph which 
corresponds to a time window (typically a few hours) centered around present time, 
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with probe vehicle data added as it is available. In this perspective, the reconstruc- 
tion and prediction operations are done simultaneously on an equal footing, the 
distinction being simply the time-stamp (past for reconstruction or future for pre- 
diction) of a given computed belief. The output of the previous run can be used as 
initial messages for a new run, in order to speedup convergence. Full re-initialization 
(typically a random set of initial messages) has to be performed within a time interval 
of the order but smaller than the time-scale of typical traffic fluctuations. 



4 Some general properties of the Belief Propagation algorithm 

This section contains several theoretical results on the BP algorithm. Although they 
are stated in the context of Section [21 most of these results can be trivially extended 
to a general factor graph and variables taking more than two values (transforming 
the Ising model into a Potts model) , except possibly for Section 14.31 



4.1 Building the model from its fixed points 

The particular use that we make of the Bethe approximation, as outlined in Sec- 
tion [331 means that the output of the algorithm takes precedence over the underlying 
Ising model, which is an unusual situation. The following propositions shows how 
to estimate (f) a and ip a p from the historical values p a and p a p. 

Let us start with a direct consequence of the BP fixed point equations. The fol- 
lowing straightforward proposition extends (|2.2p to the case of a non-tree structure. 



Proposition 4.1. A set of beliefs {b a , b a p} corresponding to a BP fixed point of \3.1 
l\3. 51) always satisfies 

f \ Ila,l3 b al3(T a ,Ti3) b a/3 (T a , Tp) 

pM = rL*-v.) = 3 bM Me WW ■ 

Proof. This is a simple consequence of (|3.4p — (|3.5|) . 



What this proposition means is that different BP fixed points correspond to 
different factorizations of the joint measure (|2.ip . The knowledge of a set of beliefs 
is thus sufficient to determine the underlying Ising model and consequently the other 
fixed points of the algorithm. 

The following proposition gives more insight on how the different components 
of (|2.ip can be written in terms of the BP fixed points. 
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Proposition 4.2. Assume that there exists a fixed point of the BP algorithm satisfying 
<TO] ) - |g75|) and such that 

(4.1) 



b a /3(r a ,Tp) = Pa(3(T a ,Tp), and therefore b a (T a ) = p a (T a ) 
Then the following equalities hold 



Pa(T a )pf3(T(3 
Pa(T a 



-ma-*p{Tp)mp-> a {T Q 



ri/36V(a) m ^( r a)' 

Conversely, assume that there exist boolean functions ii a fi(jp) such that 

Pa0(T a ,T/3) 



(f>a(r a ) 



Pa(T a ) 



Then m a ^/3 = [i a p is a fixed point of the BP algorithm and f^, 1\ ) holds. 
Proof. Relation (|4.2|) is obtained by rewriting (|3.4|) and (|3.5|) as 



(4.2) 
(4.3) 

(4.4) 
(4.5) 



<t>a{T a )n a -+p{T a )np-> a (Tp)4>p{Tp) 
b a p(Ta,Tp) 



To prove the second assertion, the first step is to show that /J, a p is a BP fixed 



point: 



r a e{0,l} L 7 GV(a)\{/3} J 

Pa(T a ) P a p{T a ,Tp) 



E 



Pa/3(r a ,T/3) 



E 



For this fixed point, f)3.5j) reduces to (|4.ip . which concludes the proof of the propo- 
sition. ■ 
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While Proposition 14.21 seems to indicate that there is some leeway in choosing 
ipa/3, a proper change of variables shows that all the choices are equivalent. Let us 
define the following new set of messages 



*F>{Tp) 



Equation (|3.ip then becomes 

X a -+p{Tp)lL a p{Tp) 

L 7 eV(a)\{/3} 

Pa(r a ) p a /3(T a ,Ti3 



E 

T a e{o,i} 



E 

T Q €{0,1} 

E 

r a G{0,l} 



W ^7^a (j~a ) 
■ 7 eV(a)\{/3} 



</>a(Ta)lJj a p(T a ,Ti3) 

Ma/3( T /3Wa( T a) 



n 



J 7— >« 



(To) 



■ 7 eV(a)\{/3} 



Vf3a(T a ) Pa(T a )Pl3(Tl3) 
Pa/3(r a ,T/3) 



and therefore 



r Q e{0,l} L 7eV(a)\{/3} 



Paf3(Tg,Tp) 



This version of the BP algorithm is thus equivalent to the heuristic choice (|3.6|) - 
(13. ZD , which corresponds to the trivial fixed point x a -+p(rp) = 1. 

Since it is equivalent in terms of convergence to the original choice of ifi a g and 
4> a , this can be seen as the canonical choice of functions to define our Ising model. 

The freedom we have in the definition of <f> and ip yields the following possibility: 

Proposition 4.3. Assume that the schema /13.1\) - [3T^) admits a set {m 1 }, i € X, of 
fixed points with corresponding beliefs {b 1 }. For any i$ £ T, choosing iq as a reference 
state by changing (j) and tp according to 
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yields a new BP scheme, with unchanged beliefs {b 1 }, but with a new set of fixed 
points 

(i/io)/ N m «V T l) 

In particular, the new reference fixed point {m(* 0//i °)} has all its components identi- 
cally equal to 1. 

4.2 Normalization and fixed points 

We discuss here a feature of the algorithm which did not get that much atten- 
tion in the literature, which is the possibility of normalizing the messages and its 
consequences on the results. In most studies, it is assumed that the messages are 
normalized so that (|3.3p holds. The update rule (|3.ip indeed indicates that there 
is an important risk to see the messages converge to or diverge to infinity. It is 
however not immediate to check that the normalized version of the algorithm has 
the same fixed points as the original one (and therefore the Bethe approximation). 
In order to make the definition of normalization clear, define the mapping 



Q a p{m)(rp) = ^ Yi m 7-K*( T a) 



r a e{o,i} 



■ 7 eV( a )\{/3} 



(kaija^apij^Tp), 



Then the normalized version of BP is defined by the following update rule 

^ e^(m)(0) + 9^(^(1)- (47) 

The relation between the fixed points of BP and normalized BP can be described 
as follows. 

Proposition 4.4. Any normalized fixed point (except 0) of the BP algorithm is a fixed 
point of the version of BP algorithm with normalized messages. 

Conversely, a fixed point of the BP algorithm with normalized messages corre- 
sponds (through multiplication by a proper constant) to an unique fixed point of the 
basic BP algorithm, except possibly when the graph Q has exactly one cycle. 

Proof. Let m be a non-null fixed point of the BP algorithm, that is 

m a ^p(rp) = Q a p(m)(Tp), V(a,/3) G £ 
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and let 



m a ^p[Tp) 



m a ^j3{U) + m a ^p{L) 



From its definition, ® a p is multilinear and 



7 gV(a)\{/?} 



and therefore m is a fixed point of the schema (|4.7|) . 

Conversely, let m be a fixed point of (|4.7jl . Then there exists a set of constants 
K a p satisfying 

® a p(™){Tp) = K a pm a -+p{Tp). 
Let us find a set of constants c a p such that 

m a ->p(Tp) = c a pm a ^p(Tp), 

be a non-zero fixed point of (|3.1|) . We have 



n 



■ 7 €V(a)\{/3} 



n 



--7 a 



■ 7 GV(a)\{/3} 
1 



Cap 



n 



0q/3(^)(^) 

iT Q/ 3 rh a ^p(Tp) 
K a p m a ^p(Tp), 



-"yet 



7 GV(a)\{/3} 



and therefore 



log c a/3 



(4.8) 



^2 lo E,Cya = log K a p. 

7£V(a)\{/3} 

Solving this equation amounts to invert a matrix I — A where A is an incidence 
matrix on the dual factor graph (the graph which connects oriented pairs (a, (3) £ £, 
see Figure 15. ip . Let v a p = log c a p . The homogeneous equation rewrites 



v a p + vp a = ^2 



(4.9) 



7 eV(a) 
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When a non-zero solution exists, then a simple symmetry argument shows that the 
right-hand side does not depend on either a or (3 and therefore can be set to 1 
without loss of generality. Therefore, summing over all oriented edges, 

2\£\ = 2 ^ ( v af3 + Vj3a) 

= J2 Yl v *p + J2 Yl 

a£V /3eV(a) /3GVagV(/3) 

= 2|V|, 

with \£\ and |V| respectively the number of edges and vertices. Since Q has only 
one component, by the well-known formula [2J, the number of cycles in the graph 
is \£\ — |V| + 1, only graphs with one cycle give possibly rise to a non-zero solution 
to (|4.9p . Conversely, when a graph has one unique cycle, it is possible to provide an 
partial ordering of vertices such that each vertex has exactly one neighbor greater 
than itself, and v a p = l{ Q>j g} is a solution to (|4.9p . ■ 

This proposition does not describe what happens when Q has exactly one cycle. 
The existence of a solution to (|4.8[) actually depends on the value of log -RT, which 
itself depends on the fixed point rh. However, since BP is known to converge in a 
finite number of steps for graphs with at most 1 cycle, normalization is not useful 
in this situation. 

From now on reference to the BP algorithm is to be understood as its normalized 
version. 



4.3 Stability of BP fixed points 

The next issue to tackle regarding the fixed points of BP is their stability. The 
following definition of conditional belief will be useful 

, / | >. def b a g(r a , Tg) 
0/3 ( r /3) 

For the general case we have the following 

Proposition 4.5. The stability of any fixed point of the BP algorithm is determined 
by the set of beliefs {b} of that fixed point: the fixed point is stable if, and only if, 
the matrix defined, for any pair of oriented edges (a, ff) € £, (a',/3') € £, by the 
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elements 

J af = ( 6 a/j(!|l) - b "/3( 1 l°)) IL {a'eV(a)\{/3}, 0>=a} ^ ^ 

= (l - &a/?(0|l) - 6 Q!( g(l|0))l{ a / 6 v(a)\{/3}, /3'=a}, 

has a spectral radius smaller than 1. 

A sufficient condition for this stability is therefore 

IM 1 ! 1 ) - Mi|o)| < —37, /° r ^«£V,/?£ V(a). 

Proof. Since we are dealing with binary variables, messages are vectors with two 
components, and it is easier to set 

dcf m a ->p(l) 

This normalization is equivalent to the one proposed in Section T4.21 according to 
the change of variables 

m a — /3(0) = — — and m a ^p(l) = Va ^ 13 , 

1 + 1]a-,p 1 + Va-tP 

and the scaled BP algorithm update rule (|4.7|) can be rewritten as 

^ M ! 1 ) + [n 7 gV( a )\{/3} ? ?7^] b «/3( 1 l 1 ) f4 . 

^ ^ M°l°) + [II 76 V(a)\{/3} %-a]Ml|0) ' 

after performing the change of referential of Proposition 14.31 with reference point 
{b}. We look for small perturbations around the fixed point T] a -^p = 1 for all (a, (3). 
The Jacobian at the point rj = 1 reads: 



Oi], 



'a/3 



dr) a >f3> 



= {b a p(l\l) - & a /j(l|0))l/ a / 6 v(c*)\{ 1 8}, /3'=a\, 

rj=l 



which proves H4. 10H . The rest of the proposition is a consequence of basic inequalities 
on the spectral radius of a matrix. ■ 
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a)(g-l)K<-l b) (q — 1)k 6 (—1,0) and c) {q - l)n > 1 

(g-l)«€(0,l) 

Fig. 4.1: Different possible graphs of f{rj) versus r\ depending on the value of k = 
6(1 1 1) — 6(1|0): (a) one unstable fixed point, (b) one stable fixed point and 
(c) one unstable and two stable fixed points. 

Remark For a totally symmetric graph with connectivity q, (|4.11|) reduces to 

^ ^ b(0|l)+^6(l|l) 
' JU) 6(0|0)+t/9- 1 6(1|0)' 

and the classification in terms of 6(1 1 1) — 6(1 1 0) is pictured in Figure I4TT1 Note that 
6(1 1 1) — 6(1 1 0) > (resp. < 0) corresponds a ferromagnetic (resp. anti-ferromagnetic) 
system. 

If one considers the dual graph formed by function nodes, where links relate 
pairs of function nodes having a variable in common, then on this graph the Ja- 
cobian matrix has the structure of the incidence matrix A already encountered in 
the preceding section. This matrix is not symmetric, but eigenvalues greater than 1 
in modulus indicate anyway an instability. These are obtained by forming the new 
matrix J(A) = J — XI with / the identity matrix and finding roots of 

det J(A) = 0. 

The expansion of det J(A) involves permutations which are compatible with circuits 
of the dual graph, where each vertex is visited once. Each permutation is uniquely 
represented by a product of permutation cycles (orbits) with disjoint support and 
is attached to a sub-graph of the dual graph. Let us call maximal permutation, a 
permutation such that the complementary graph of its associated sub-graph is cycle 
free. Adapting results of [6], det J(A) may be expanded according to the following, 

Proposition 4.6. 

detJ(A) = ^ J] (detwi + (-A)^l), (4.12) 
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where the sum runs over all possible maximal permutations a, each one being ex- 
pressed as a product of n > 1 circular permutations (cycles) u>i,i = 1 . . . n, of size 
\u)i\, with determinant given by 

deta/ = -(-l)H J] (Ml|l)-6 a/3 (l|0)). 

On a tree, as expected, zero is the only eigenvalue, in fact J is a nilpotent matrix 
of index the size of the longest directed path in the graph. If there is only one cycle 
u>, (|4.12p reduces to 

detJ(A) = X N ~^(detu; + {-X)^), 
which yields the eigenvalues 

a*=( n (^(iio)-^(iii)))% (2fc+i)wM , 

with modulus obviously smaller than one. As a consequence, the following proposi- 
tion holds. 

Proposition 4.7. BP fixed points for a graph containing at most one oriented loop 
are stable. 

This has been remarked by different means in [7J. Unstable modes correspond 
to eigenvalues larger than 1, and might reveal vertices or cycles mostly responsi- 
ble for the instabilities. An interesting case occurs when cycles of the dual graph 
have disjoint supports, because then only one maximal permutation a exists and 
expansion (|4.12p reduces to one term, 

detJ(A)= Yl (deWi + (-A) |Wl1 ). 

As a result, since the modulus of the Jacobian coefficients are always smaller than 
1, to each cycle is associated an eigenvalue smaller than 1 and the state is stable. 

On a graph which is locally a tree (Bethe lattice), a mean- field equation can 
be used to evaluate the stability of a given fixed point. The idea is to consider 
the iterated Jacobian matrix in a statistical manner, by looking at the distribution 
p( n \v) of components v of an iterated vector starting from a non-degenerate initial 
condition, 

y(n) _ jny(O) _ 
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The mean-field stability equation then simply reads (after assuming the usual inde- 
pendence property of parent messages) 



with Q the connectivity distribution in the dual graph and R the Jacobian coef- 
ficient distribution (see Figure 15.31 for example) . The instability is therefore fully 
characterized by the statistical properties of the considered BP fixed point and by 
the statistical properties of the graph (connectivity), which sometimes can be an 
adjustable parameter. 

5 Toy Model simulations 
5.1 From theory to practice 

We illustrate these ideas on a simulated traffic system which has the advantage to 
yield exact empirical data correlations. For real data, problems may arise because of 
noise in the historical information used to build the model. This additional difficulty 
will be treated in a separate work. 

The model consists of a queueing network system. Each queue represents a link 
of the traffic network (a single-way lane) and has a finite capacity; to each link we 
attach a variable p £ [0, 1], the car density, which is represented by a color code in 
the picture (Figure E2] on page [23]). 

As already stated in Section [21 the physical traffic network is replicated, to form 
a space time graph, in which each vertex a = (£,t) corresponds to a link £ at a 
given time t of the traffic graph. To any space-time vertex a, we associate a binary 
congestion variable r a £ {0, 1}. 

The statistical physics description amounts to relating the probability of satu- 
ration P(r a = 1) to the density p a . For the sake of simplicity, we consider a linear 




(4.13) 
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Fig. 5.1: Structure of the factor graph (left), 3 time-layers portions are represented, 
black circles correspond to crossroads and blue squares to factor-vertices. 
Corresponding graph for the Jacobian matrix (right). 

relation and build our historical p according to the rules 

Pa(l) = Poo(pe), 
p a (0) = /ioo(l - pi), 

PapO-, 1) = Poo(pePe'), 
PaffO-, 0) = Poo{pe(l - pt')), 

where fioo is simply a frequency estimator. Note that, to follow some realistic sta- 
tistical constraints, we use here only data aggregated in time. More realistic data 
collection and modeling would work the same way. 

The structure of the factor-graph on which we propagate the information is 
depicted in Figure 15. 11 

Some fine tuning is required to let the algorithm work correctly. First, from 
Proposition 14.51 we know that the stability of the reference point p encoded in 
(|3.6p - (|3.7|) is not guaranteed; this may be evaluated on the basis of distributions 
depicted in Figure l5T3| using equation (14. 13p . In absence of negative correlations, it 
is likely that our system is either a paramagnetic-like (in the Ising-model terminol- 
ogy) system, with small fluctuations around average values, or a ferromagnetic-like 
system, in the sense that positive correlation drive the system to a state where links 
are in a similar state, i.e. mostly fluid (low state) or congested (high state). This 
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scenario corresponds to the regimes pictured on Figure 14.11 where case (c) is the 
usual ferromagnetic phase transition in the Ising model. It is also a well-known fact 
that this transition is driven by the temperature. To introduce the equivalent of a 
temperature in our equations, since its effect is essentially to reduce correlations, let 
us consider modified pairwise marginal laws 

Pap(e) = CPafj + (1 - t)PaP(3- 

The high temperature regime corresponds here to e — > and the vanishing of the cor- 
relations. The Jacobian coefficients K a ^ = 6 a( g(l|l) — 6 a/ g(l|0) are modified according 
to 

K a p{e) = eK a /3, 

which means that eigenvalues are rescaled by a factor e. For our purpose, this 
provides us with an adjustable mean-field parameter, to correct some artificial am- 
plification of correlations caused by closed loops in the graph. We expect that there 
exists a critical value of e c corresponding to the ferromagnetic phase transition point 
(high temperature means here small e). In addition, since for small e we recover in 
one sweep the bare mean results, this parameter can be used for a simulated anneal- 
ing procedure, by letting it converge from zero to the desired value during the BP 
iterations. 

The second adjustment concerns the encoding of real-time information. The 
probe vehicle is assumed to send an information for some space-time vertex a*, 
typically in the form of an instantaneous velocity, from which is estimated the prob- 
ability p a * of saturation. Instead of projecting this information on one of the two 
states (r a * = or r a * = 1), which turns out in practice to be too coarse, we use 
a procedure which amounts to bias the messages sent by a* in proportion to the 
observed belief p a * . In the statistical physics language, this amounts to impose an 
external local field on the observed variable. 

The last issue concerns the situation where the system is below the transition 
point, in which case we have two separate states, and it is always possible that BP 
converges towards the wrong fixed-point. In this simple ferromagnetic situation, it is 
in fact easy to enforce the convergence of the algorithm to a specified fixed point by 
applying a slowly decaying external field, enforcing either the fluid or the congested 
state. As a result of this procedure, we obtain two sets of beliefs {6°} and {b 1 }, 
with corresponding free energies -F and F , from which we build the superposition 
belief, 

e -F° o +e -Fi b i 
° a ~ e -F0 + e -Fi • 
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which in practice, for sufficiently large systems, because F is extensive, turns out to 
be the set of beliefs corresponding to the lowest free energy. 

In the following we refer the sets of belief {b° a }, {b l a } and {b a } respectively to the 
low, the high and the combined inference state. Accordingly the set {p a } is referred 
to as the historical state. In addition, the combination of observations with historical 
data (by replacing the historical value with the last observation in the window time) 
yields the actual state. To estimate the quality of the traffic restoration we use the 
following estimator: 



which computes the fraction of space-time nodes a for which the belief b a does not 
differ by more than an arbitrary threshold of 0.2 from p a . 

5.2 Numerical results 

We have tested the algorithm on the toy traffic network shown on the program's 
screen-shot of Figure 15.21 The characteristics of this network are summarized in 
Table 15. 1[ Two types of traffic conditions have been used, that both correspond 
to periodic oscillation superimposed with noise (see blue curve of Figure I5.4p ; they 
simply differ by the level of the noise. 

The two values of e c in Table 15.11 that have been computed for the two different 
traffic regimes using (|4.13j) are close to the observed values, which indicates that 
the space-time graph on which BP is run is close to the conditions of a dilute graph 
(Bethe lattice). 

The simulation run of Figure 15.41 compares the policies of using only the low 
state (green), the high state (red) or the combined state w.r.t. the free energy in 
the low-noise case. Abrupt changes of the combined state prediction correspond to 
the crossing of the Bethe free energies. In the transition regimes, which correspond 
to out-of-equilibrium situations, the free energy criteria sometimes select the wrong 
state. The reason for this is that the present design of our algorithm encodes only 
statistical information at equilibrium. Time correlations should be incorporated in 
some way, to encode transition rates between the macro-states (here the low and 
high traffic density). 

Distributions of performance errors shown in Figure 15.51 are based on a simula- 
tion run of 10000 traffic time units where a belief propagation is run every 3 units of 
time for both low and high inference states, to reconstruct the traffic. Varying the 
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#traceurs: 5 
#capteurs: 
traffic level: 0.52 3S 30 




Fig. 5.2: Traffic network as produced by the simulator. The continuous color code 
represents the traffic index from (full green) to 1 (full red). 
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Fig. 5.3: Connectivity (black), correlation coefficients (red) and Jacobian coefficients 
(blue) histograms for low (top) and high (bottom) noise level. 



nodes 


links 


time steps 


graph size 


e Cl (oscillating) 


e C2 (noisy) 


35 


122 
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Tab. 5.1: Toy model characteristics 
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Fig. 5.4: Reconstruction rates for the various possible inference states as a function 
of time with corresponding free energies, with 10 probe vehicles and e = 
0.75. 
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Fig. 5.5: Distribution of reconstruction errors for the various possible inference 
states for e = 0.75. 
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0,6 0,8 1 1,2 

epsilon ( ^ ) 

Fig. 5.6: (a) Reconstruction rates obtained with 0, 1, 5 and 10 probe vehicles and 
various possible inference states; (b) variance of the reconstruction rate 
obtained with 10 vehicles also for the various possible states; (c) recon- 
struction rates obtained for the noisy network, again with 10 probes. 
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Fig. 5.7: (a) reconstruction rates with various possible inference states when the 
number of probe vehicles is varied; (b) corresponding average prediction 
error. 



parameters (either e or the number of probe vehicles) and integrating the distribu- 
tions up to 0.2 yields curves of Figures and loTTl They indicate that the optimal 
value of e for traffic prediction is slightly above the critical value for the traffic oscil- 
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lating network and below the critical value for the noisy network, as expected. The 
fact that the prediction rate saturates at 0.8 when the number of probe vehicles is 
increased in Figure 15.71 is again due to the traffic transition regimes. 

6 Conclusion and perspectives 

We have presented a novel methodology for reconstruction and prediction of traffic 
using the Belief Propagation algorithm on Floating Car Data. We have shown how 
the underlying Ising model can be determined in a straightforward manner and 
that it is unique up to some change of variables. In addition, the effect of message 
normalization and the stability properties can be asserted from the original data. 
The unfortunate fact that the BP fixed point corresponding to the historical data 
may be unstable can be circumvented by rescaling of the correlations. The algorithm 
has been implemented and illustrated using a toy traffic model. 
Several generalizations are considered for future work: 

• firstly, the binary description corresponding to the underlying Ising model is 
arbitrary. Traffic patterns could be represented in terms of p different inference 
states. A Potts model with p-states variables would leave the belief propagation 
algorithm and its stability properties structurally unchanged. Actually this 
number p should be subject to an optimization procedure. 

• secondly, our way of encoding traffic network information might need to be 
augmented to cope with real world situations. This would simply amount to 
redefine the factor-graph used to propagate this information. In particular it 
is likely that a great deal of information is contained in the correlations of 
local congestion with aggregate traffic indexes, corresponding to sub-regions 
of the traffic network. Taking these correlations into account would result in 
the introduction of specific variables and function nodes associated to these 
aggregate traffic indexes. These aggregate variables would naturally lead to a 
hierarchical representation of the factor graph, which is necessary for inferring 
the traffic on large scale network. Additionally, time dependent correlations 
which are needed for the description of traffic, which by essence is an out of 
equilibrium phenomenon, could be conveniently encoded in these traffic index 
variables. 

Ultimately, for the elaboration of a powerful prediction system, the structure 
of the information content of a traffic-road network has to be elucidated through a 
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specific statistical analysis. The use of probe vehicles, based on modern communi- 
cations devices, combined with a belief propagation approach, is in this respect a 
very promising approach. 
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